Monitoring Distributed Streams using Convex Decompositions (VLDB 2015)

نویسندگان

  • Arnon Lazerson
  • Izchak Sharfman
  • Daniel Keren
  • Assaf Schuster
  • Minos Garofalakis
  • Vasilis Samoladas
چکیده

Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the spaceand time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem — this is especially true for nonlinear queries (e.g., joins), where no obvious solutions exist for distributing the monitored condition across sites. The recently proposed geometric method, based on the notion of covering spheres, offers a generic methodology for splitting an arbitrary (non-linear) global condition into a collection of local site constraints, and has been applied to massive distributed stream-monitoring tasks, achieving state-of-the-art performance. In this paper, we present a far more general geometric approach, based on the convex decomposition of an appropriate subset of the domain of the monitoring query, and formally prove that it is always guaranteed to perform at least as good as the covering spheres method. We analyze our approach and demonstrate its effectiveness for the important case of sketchbased approximate tracking for norm, range-aggregate, and joinaggregate queries, which have numerous applications in streaming data analysis. Experimental results on real-life data streams verify the superiority of our approach in practical settings, showing that it substantially outperforms the covering spheres method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Monitoring Distributed Streams using Convex Decompositions

Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the spaceand time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communic...

متن کامل

Feature Selection over Distributed Data Streams through Convex Optimization

Monitoring data streams in a distributed system has attracted considerable interest in recent years. The task of feature selection (e.g., by monitoring the information gain of various features) requires a very high communication overhead when addressed using straightforward centralized algorithms. While most of the existing algorithms deal with monitoring simple aggregated values such as freque...

متن کامل

Approximate Geometric Query Tracking over Distributed Streams

Effective Big Data analytics pose several difficult challenges for modern data management architectures. One key such challenge arises from the naturally streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams ...

متن کامل

Challenges and Experience in Prototyping a Multi-Modal Stream Analytic and Monitoring Application on System S

In this paper, we describe the challenges of prototyping a reference application on System S, a distributed stream processing middleware under development at IBM Research. With a large number of stream PEs (Processing Elements) implementing various stream analytic algorithms, running on a large-scale, distributed cluster of nodes, and collaboratively digesting several multi-modal source streams...

متن کامل

Sketching Streams Through the Net: Distributed Approximate Query Tracking

Emerging large-scale monitoring applications require continuous tracking of complex dataanalysis queries over collections of physicallydistributed streams. Effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014